Improving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Abstract:
Classification is one of the important parts of data mining and knowledge discovery. In most cases, the data used to train the classifiers is not well distributed. This imbalance occurs when one class has a large number of samples while the number of samples in the other class is inherently low. In general, methods for solving this kind of problem fall into two categories: under-sampling and over-sampling. In this paper, an under-sampling method using subtractive clustering and a fuzzy similarity measure is presented, and its performance is analyzed in terms of efficiency in classifying imbalanced data. For this purpose, subtractive clustering is first applied to cluster the majority class data. Then, using the fuzzy similarity measure, the samples of each cluster are ranked, and appropriate samples are selected based on these rankings. The selected samples, together with the minority class, constitute the final dataset. In this research, MATLAB is used for implementation, the results are evaluated using the AUC criterion, and the results are analyzed using standard statistical methods. The experimental results show the effectiveness of the proposed method compared to other under-sampling methods.
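The steps outlined in the abstract (cluster the majority class, rank each cluster's samples by fuzzy similarity, keep the top-ranked ones, and merge them with the minority class) can be sketched roughly as follows. This is only an illustrative Python sketch, not the authors' MATLAB implementation. Several details are assumptions that the abstract does not specify: the simplified stopping rule for subtractive clustering (`stop_ratio`), the min/max form of the fuzzy similarity measure, ranking against the cluster center, keeping the most similar samples, and all function and parameter names. Features are assumed to be scaled to [0, 1].

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def subtractive_clustering(points, ra=0.5, rb_factor=1.5, stop_ratio=0.15):
    """Simplified subtractive clustering; returns a list of cluster centers.

    Assumption: a single stopping ratio replaces the accept/reject
    thresholds of the full algorithm.
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (rb_factor * ra) ** 2
    # Potential of each point: a density estimate over all points.
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        best = max(range(len(points)), key=potential.__getitem__)
        peak = potential[best]
        if peak < stop_ratio * first_peak:
            break
        center = points[best]
        centers.append(center)
        # Reduce the potential of points close to the new center.
        potential = [p - peak * math.exp(-beta * sq_dist(x, center))
                     for p, x in zip(potential, points)]
    return centers

def fuzzy_similarity(a, b):
    """Min/max fuzzy similarity for vectors in [0, 1] (an assumed choice)."""
    den = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 if den == 0 else sum(min(x, y) for x, y in zip(a, b)) / den

def undersample(majority, minority, ra=0.5):
    """Return (sample, label) pairs: selected majority (0) plus minority (1)."""
    centers = subtractive_clustering(majority, ra=ra)
    clusters = [[] for _ in centers]
    for p in majority:
        nearest = min(range(len(centers)),
                      key=lambda k: sq_dist(p, centers[k]))
        clusters[nearest].append(p)
    # Keep roughly as many majority samples as there are minority samples.
    keep_fraction = min(1.0, len(minority) / len(majority))
    selected = []
    for center, members in zip(centers, clusters):
        # Rank members by similarity to the center, most similar first.
        ranked = sorted(members,
                        key=lambda p: fuzzy_similarity(p, center),
                        reverse=True)
        n_keep = max(1, round(keep_fraction * len(members)))
        selected.extend(ranked[:n_keep])
    return [(p, 0) for p in selected] + [(p, 1) for p in minority]
```

Calling `undersample(majority, minority)` with two lists of feature tuples returns a roughly balanced list of labeled samples that could then be passed to any classifier and scored with AUC. The radius `ra` controls how many clusters are found and would need tuning for real data.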
Similar resources
On Mining Fuzzy Classification Rules for Imbalanced Data
Fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance
Given the ever-increasing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying appropriate and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can be useful in determining whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...
Hysteresis Modeling using Fuzzy Subtractive Clustering
This paper summarizes work undertaken in the area of modeling Shape Memory Alloy (SMA) and airfoil hysteresis using a Sugeno-type fuzzy modeling approach based on subtractive clustering. Two alternative approaches to develop a fuzzy model for hysteresis are proposed and evaluated. The first consists in building a mirror image of the lower curve in order to model both curves concurrently and the...
Journal title
Volume 19, Issue 2
Pages 27-38
Publication date: 2022-09
No keywords have been provided for this article.